Basic Quantitative Characteristics of the Modern Greek Language Using the Hellenic National Corpus

Authors

  • George K. Mikros
  • Nick Hatzigeorgiu
  • George Carayannis
Abstract

Modern Greek is one of the least quantitatively studied modern European languages, and the goal of this paper is to fill this relative void. We use the Hellenic National Corpus (HNC), which is a growing corpus that currently includes 33 million words. The corpus and all the tools used in our work were developed by the Institute for Language and Speech Processing (ILSP). In this paper we focus on three main areas: the lists of the 1000 most common words and lemmas, word length, and letter frequency. We also make some comparisons with earlier work, in which we had used the previous 13 million word edition of the HNC.
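The three measurements named in the abstract (most common words, word length, letter frequency) can be illustrated with a minimal sketch. This is not the ILSP tooling, which the abstract does not describe; the function name `basic_counts`, the regular-expression tokenizer, and the sample sentence are all assumptions made for illustration, and lemma counts are omitted because they would need a lemmatizer.

```python
import re
import unicodedata
from collections import Counter


def basic_counts(text, top_n=1000):
    """Return (most common words, word-length counts, letter counts).

    Hypothetical helper for illustration only; tokenization here is a
    simple "runs of letters" rule, not the HNC tokenizer.
    """
    # Normalize to NFC so accented Greek letters are single code points.
    text = unicodedata.normalize("NFC", text).lower()
    # Match runs of Unicode letters (word characters minus digits and "_").
    words = re.findall(r"[^\W\d_]+", text)
    word_freq = Counter(words)
    length_freq = Counter(len(w) for w in words)
    letter_freq = Counter(ch for w in words for ch in w)
    return word_freq.most_common(top_n), length_freq, letter_freq


if __name__ == "__main__":
    top, lengths, letters = basic_counts("το σπίτι και το δέντρο", top_n=3)
    print(top)
    print(sorted(lengths.items()))
    print(letters.most_common(3))
```

Accented and unaccented vowels are counted as distinct letters in this sketch; whether to merge them is a design choice the abstract does not settle.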

Related articles

Quantitative parameters in corpus design: Estimating the optimum text size in Modern Greek language

The aim of this paper is to investigate the major quantitative parameters related to the definition of the optimum text size in Modern Greek corpus development. Using the Hellenic National Corpus (HNC) (Hatzigeorgiu et al., 2000) as a reference point we estimated a number of critical statistical measures regarding feature counting in different text sizes. The results indicate that frequent ling...

Design and Implementation of the Online ILSP Greek Corpus

This paper presents the Hellenic National Corpus (HNC), which is the corpus of Modern Greek developed by the Institute for Language and Speech Processing (ILSP). The presentation describes all stages of the creation of the corpus: collection of the material, tagging and tokenizing, construction of the database and the online implementation which aims at rendering the corpus accessible over Internet to...

Tribalism & Racism among the Ancient Greeks A Weberian Perspective

Were the ancient Greeks “racists” in the modern sense of the term “racist”? The terms ancient Greek “proto-racism”, tribalism (and/or racism) are used here to denote the abstract, narcissistic notion that not only the non-Greek barbarians, but also certain ancient Greek tribes (like the Macedonians, the Boeotians etc.) should be excluded from the Hellenic community, for they were considered to...

Metalanguage or bidialectism? acquisition of clitic placement by Hellenic Greeks, Greek Cypriots and binationals in the diglossic context of Cyprus

Acquisition of object clitics is one of the more investigated aspects of the largely understudied variety of Modern Greek spoken in the Republic of Cyprus. Previous studies on the acquisition of clitics in Cypriot Greek usually acknowledge that the linguistic reality in Cyprus involves a state of diglossia, where the sociolinguistically ‘high’ Standard Modern Greek co-exists with the ‘low’ Cypr...

A Finite-State Approach to the Computational Morphology of Early Modern Greek

We present a finite-state approach to the computational morphology of early Modern Greek that improves the efficiency of searching and accessing the “Politimo” corpus, which consists of Greek documents printed during the 17th and 18th centuries. Computational morphologies give users the ability to search documents using only a word root and locate all the corresponding inflected words. Ke...

Journal:
  • Journal of Quantitative Linguistics

Volume 12  Issue 

Pages  -

Publication date: 2005